Introduce Backend Interface (DatabricksClient) #573

Merged: 149 commits merged into sea-migration on May 30, 2025

Conversation

varun-edachali-dbx
Collaborator

@varun-edachali-dbx varun-edachali-dbx commented May 26, 2025

What type of PR is this?

  • Refactor

Description

  • Introduce the DatabricksClient interface and make the existing Thrift backend implement it, so that the cursor does not need to know which type of backend was instantiated. For now, the ResultSet still includes some assertions that the backend is a ThriftDatabricksClient, because the fetch-phase abstractions have not been implemented yet.
  • Introduce SessionId and CommandId interfaces as a consistent adapter for representing sessions and commands, instead of relying on Thrift-specific (or, eventually, SEA-specific) types.
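
The shape of the refactor described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the method signatures, the `BackendType` enum, the `InMemoryThriftClient` stand-in, and the `run_query` helper are all hypothetical, and only the names `DatabricksClient`, `SessionId`, `execute_command`, and `to_hex_guid` are taken from the PR text and commit messages. A `CommandId` would presumably follow the same pattern as `SessionId`.

```python
from __future__ import annotations

import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Union


class BackendType(Enum):
    # Hypothetical enum naming the two backends mentioned in the PR.
    THRIFT = "thrift"
    SEA = "sea"


@dataclass(frozen=True)
class SessionId:
    """Backend-neutral session identifier (illustrative, not the PR's code)."""

    backend_type: BackendType
    guid: Union[bytes, str]  # assumption: raw bytes for Thrift, a string for SEA

    def to_hex_guid(self) -> str:
        # Bytes are rendered as hex; string identifiers are returned unchanged.
        if isinstance(self.guid, bytes):
            return self.guid.hex()
        return self.guid

    def __str__(self) -> str:
        return self.to_hex_guid()


class DatabricksClient(ABC):
    """Operations a cursor needs, independent of the transport (assumed signatures)."""

    @abstractmethod
    def open_session(self) -> SessionId: ...

    @abstractmethod
    def close_session(self, session_id: SessionId) -> None: ...

    @abstractmethod
    def execute_command(self, session_id: SessionId, operation: str) -> str: ...


class InMemoryThriftClient(DatabricksClient):
    """Stand-in backend that tracks sessions in a dict instead of calling a server."""

    def __init__(self) -> None:
        self._open: Dict[str, bool] = {}

    def open_session(self) -> SessionId:
        session_id = SessionId(BackendType.THRIFT, uuid.uuid4().bytes)
        self._open[str(session_id)] = True
        return session_id

    def close_session(self, session_id: SessionId) -> None:
        self._open.pop(str(session_id), None)

    def execute_command(self, session_id: SessionId, operation: str) -> str:
        if str(session_id) not in self._open:
            raise RuntimeError(f"session {session_id} is not open")
        return f"executed: {operation}"


def run_query(client: DatabricksClient, operation: str) -> str:
    """Cursor-like caller: it depends only on the interface, never on Thrift types."""
    session_id = client.open_session()
    try:
        return client.execute_command(session_id, operation)
    finally:
        client.close_session(session_id)
```

Because `run_query` is typed against `DatabricksClient`, a second backend could be passed in without changing the caller, which is the property the description is after.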

How is this tested?

  • Unit tests
    • Some of the existing unit tests were slightly altered to account for the new interface. No unit tests were removed or added.
  • E2E Tests
  • Manually
  • N/A

Related Tickets & Documents

https://docs.google.com/document/d/1Y-eXLhNqqhrMVGnOlG8sdFrCxBTN1GdQvuKG4IfHmo0/edit?usp=sharing
https://databricks.atlassian.net/browse/PECOBLR-440?atlOrigin=eyJpIjoiMTgzNGNiMDVkMGQ3NDM2Njg5OTRhZWQ1MGQ4Mjg1OWIiLCJwIjoiaiJ9

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Contributor

@jayantsing-db jayantsing-db left a comment

Thanks for the changes and, most importantly, the changes to the tests. Some minor nit comments; the rest LGTM.

@jayantsing-db
Contributor

Please make a note of the failing integration test in the commit message when pushing, if it is unrelated to these changes.

Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
@samikshya-db
Contributor

In the future, let's try to raise smaller size PRs typically within 700 lines of code changes including tests. This will help you get more reviews and more importantly more thorough reviews. Thanks.

@jayantsing-db
Contributor

> In the future, let's try to raise smaller size PRs typically within 700 lines of code changes including tests. This will help you get more reviews and more importantly more thorough reviews. Thanks.

Hey @samikshya-db, it's tricky to further scope this down. Otherwise, it was adding too much overhead.

@samikshya-db
Contributor

@jayantsing-db I understand this would be harder for the initial set of SEA PRs. Even then, it is good to keep this in mind and try to break it down. Happy to brainstorm on this too.

@jayantsing-db
Contributor

jayantsing-db commented May 30, 2025

> Even then, it is good to keep this in mind and try to break it down. Happy to brainstorm on this too.

Yes.

> I understand this would be harder for the initial set of SEA PRs

Heads up: this and a couple more are refactoring PRs to prepare for the SEA changes. Refactoring PRs are usually high in LOC because we have to make sure the current set of tests does not break, so the related changes have to go in together. Even then, @varun-edachali-dbx's first session refactoring PR is ~500 LOC.

For the SEA PRs, LOC per PR will be low, as expected.

@varun-edachali-dbx varun-edachali-dbx merged commit 400a8bd into sea-migration May 30, 2025
22 of 23 checks passed
varun-edachali-dbx added a commit that referenced this pull request Jun 18, 2025
NOTE: the `test_complex_types` e2e test was not working at the time of this merge. The test must be triggered when the test is back up and running as intended.

* remove excess logs, assertions, instantiations

large merge artifacts

Signed-off-by: varun-edachali-dbx <[email protected]>

* formatting (black) + remove excess log (merge artifact)

Signed-off-by: varun-edachali-dbx <[email protected]>

* fix typing

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove un-necessary check

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove un-necessary replace call

Signed-off-by: varun-edachali-dbx <[email protected]>

* introduce __str__ methods for CommandId and SessionId

Signed-off-by: varun-edachali-dbx <[email protected]>

* docstrings for DatabricksClient interface

Signed-off-by: varun-edachali-dbx <[email protected]>

* stronger typing of Cursor and ExecuteResponse

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove utility functions from backend interface, fix circular import

Signed-off-by: varun-edachali-dbx <[email protected]>

* rename info to properties

Signed-off-by: varun-edachali-dbx <[email protected]>

* newline for cleanliness

Signed-off-by: varun-edachali-dbx <[email protected]>

* fix circular import

Signed-off-by: varun-edachali-dbx <[email protected]>

* formatting (black)

Signed-off-by: varun-edachali-dbx <[email protected]>

* to_hex_id -> get_hex_id

Signed-off-by: varun-edachali-dbx <[email protected]>

* better comment on protocol version getter

Signed-off-by: varun-edachali-dbx <[email protected]>

* formatting (black)

Signed-off-by: varun-edachali-dbx <[email protected]>

* move guid to hex id to new utils module

Signed-off-by: varun-edachali-dbx <[email protected]>

* formatting (black)

Signed-off-by: varun-edachali-dbx <[email protected]>

* move staging allowed local path to connection props

Signed-off-by: varun-edachali-dbx <[email protected]>

* add strong return type for execute_command

Signed-off-by: varun-edachali-dbx <[email protected]>

* skip auth, error handling in databricksclient interface

Signed-off-by: varun-edachali-dbx <[email protected]>

* chore: docstring + line width

Signed-off-by: varun-edachali-dbx <[email protected]>

* get_id -> get_guid

Signed-off-by: varun-edachali-dbx <[email protected]>

* chore: docstring

Signed-off-by: varun-edachali-dbx <[email protected]>

* fix: to_hex_id -> to_hex_guid

Signed-off-by: varun-edachali-dbx <[email protected]>

---------
Signed-off-by: varun-edachali-dbx <[email protected]>
@varun-edachali-dbx varun-edachali-dbx mentioned this pull request Jun 18, 2025
5 tasks
varun-edachali-dbx added a commit that referenced this pull request Jul 15, 2025
* Separate Session related functionality from Connection class (#571)

* decouple session class from existing Connection

ensure maintenance of current APIs of Connection while delegating
responsibility

Signed-off-by: varun-edachali-dbx <[email protected]>

* add open property to Connection to ensure maintenance of existing API

Signed-off-by: varun-edachali-dbx <[email protected]>

* update unit tests to address ThriftBackend through session instead of through Connection

Signed-off-by: varun-edachali-dbx <[email protected]>

* chore: move session specific tests from test_client to test_session

Signed-off-by: varun-edachali-dbx <[email protected]>

* formatting (black)

as in CONTRIBUTING.md

Signed-off-by: varun-edachali-dbx <[email protected]>

* use connection open property instead of long chain through session

Signed-off-by: varun-edachali-dbx <[email protected]>

* trigger integration workflow

Signed-off-by: varun-edachali-dbx <[email protected]>

* fix: ensure open attribute of Connection never fails

in case the openSession takes long, the initialisation of the session
will not complete immediately. This could make the session attribute
inaccessible. If the Connection is deleted in this time, the open()
check will throw because the session attribute does not exist. Thus, we
default to the Connection being closed in this case. This was not an
issue before because open was a direct attribute of the Connection
class. Caught in the integration tests.

Signed-off-by: varun-edachali-dbx <[email protected]>

* fix: de-complicate earlier connection open logic

earlier, one of the integration tests was failing because 'session was
not an attribute of Connection'. This is likely tied to a local
configuration issue related to unittest that was causing an error in the
test suite itself. The tests are now passing without checking for the
session attribute.
c676f9b

Signed-off-by: varun-edachali-dbx <[email protected]>

* Revert "fix: de-complicate earlier connection open logic"

This reverts commit d6b1b19.

Signed-off-by: varun-edachali-dbx <[email protected]>

* [empty commit] attempt to trigger ci e2e workflow

Signed-off-by: varun-edachali-dbx <[email protected]>

* Update CODEOWNERS (#562)

new codeowners

Signed-off-by: varun-edachali-dbx <[email protected]>

* Enhance Cursor close handling and context manager exception management to prevent server side resource leaks (#554)

* Enhance Cursor close handling and context manager exception management

* tests

* fmt

* Fix Cursor.close() to properly handle CursorAlreadyClosedError

* Remove specific test message from Cursor.close() error handling

* Improve error handling in connection and cursor context managers to ensure proper closure during exceptions, including KeyboardInterrupt. Add tests for nested cursor management and verify operation closure on server-side errors.

* add

* add

Signed-off-by: varun-edachali-dbx <[email protected]>

* PECOBLR-86 improve logging on python driver (#556)

* PECOBLR-86 Improve logging for debug level

Signed-off-by: Sai Shree Pradhan <[email protected]>

* PECOBLR-86 Improve logging for debug level

Signed-off-by: Sai Shree Pradhan <[email protected]>

* fixed format

Signed-off-by: Sai Shree Pradhan <[email protected]>

* used lazy logging

Signed-off-by: Sai Shree Pradhan <[email protected]>

* changed debug to error logs

Signed-off-by: Sai Shree Pradhan <[email protected]>

* used lazy logging

Signed-off-by: Sai Shree Pradhan <[email protected]>

---------

Signed-off-by: Sai Shree Pradhan <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>

* Revert "Merge remote-tracking branch 'upstream/sea-migration' into decouple-session"

This reverts commit dbb2ec5, reversing
changes made to 7192f11.

Signed-off-by: varun-edachali-dbx <[email protected]>

* Reapply "Merge remote-tracking branch 'upstream/sea-migration' into decouple-session"

This reverts commit bdb8381.

Signed-off-by: varun-edachali-dbx <[email protected]>

* fix: separate session opening logic from instantiation

ensures correctness of self.session.open call in Connection

Signed-off-by: varun-edachali-dbx <[email protected]>

* fix: use is_open attribute to denote session availability

Signed-off-by: varun-edachali-dbx <[email protected]>

* fix: access thrift backend through session

Signed-off-by: varun-edachali-dbx <[email protected]>

* chore: use get_handle() instead of private session attribute in client

Signed-off-by: varun-edachali-dbx <[email protected]>

* formatting (black)

Signed-off-by: varun-edachali-dbx <[email protected]>

* fix: remove accidentally removed assertions

Signed-off-by: varun-edachali-dbx <[email protected]>

---------

Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: Sai Shree Pradhan <[email protected]>
Co-authored-by: Jothi Prakash <[email protected]>
Co-authored-by: Madhav Sainanee <[email protected]>
Co-authored-by: Sai Shree Pradhan <[email protected]>

* Introduce Backend Interface (DatabricksClient) (#573)

NOTE: the `test_complex_types` e2e test was not working at the time of this merge. The test must be triggered when the test is back up and running as intended. 

* remove excess logs, assertions, instantiations

large merge artifacts

Signed-off-by: varun-edachali-dbx <[email protected]>

* formatting (black) + remove excess log (merge artifact)

Signed-off-by: varun-edachali-dbx <[email protected]>

* fix typing

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove un-necessary check

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove un-necessary replace call

Signed-off-by: varun-edachali-dbx <[email protected]>

* introduce __str__ methods for CommandId and SessionId

Signed-off-by: varun-edachali-dbx <[email protected]>

* docstrings for DatabricksClient interface

Signed-off-by: varun-edachali-dbx <[email protected]>

* stronger typing of Cursor and ExecuteResponse

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove utility functions from backend interface, fix circular import

Signed-off-by: varun-edachali-dbx <[email protected]>

* rename info to properties

Signed-off-by: varun-edachali-dbx <[email protected]>

* newline for cleanliness

Signed-off-by: varun-edachali-dbx <[email protected]>

* fix circular import

Signed-off-by: varun-edachali-dbx <[email protected]>

* formatting (black)

Signed-off-by: varun-edachali-dbx <[email protected]>

* to_hex_id -> get_hex_id

Signed-off-by: varun-edachali-dbx <[email protected]>

* better comment on protocol version getter

Signed-off-by: varun-edachali-dbx <[email protected]>

* formatting (black)

Signed-off-by: varun-edachali-dbx <[email protected]>

* move guid to hex id to new utils module

Signed-off-by: varun-edachali-dbx <[email protected]>

* formatting (black)

Signed-off-by: varun-edachali-dbx <[email protected]>

* move staging allowed local path to connection props

Signed-off-by: varun-edachali-dbx <[email protected]>

* add strong return type for execute_command

Signed-off-by: varun-edachali-dbx <[email protected]>

* skip auth, error handling in databricksclient interface

Signed-off-by: varun-edachali-dbx <[email protected]>

* chore: docstring + line width

Signed-off-by: varun-edachali-dbx <[email protected]>

* get_id -> get_guid

Signed-off-by: varun-edachali-dbx <[email protected]>

* chore: docstring

Signed-off-by: varun-edachali-dbx <[email protected]>

* fix: to_hex_id -> to_hex_guid

Signed-off-by: varun-edachali-dbx <[email protected]>

---------
Signed-off-by: varun-edachali-dbx <[email protected]>

* Implement ResultSet Abstraction (backend interfaces for fetch phase) (#574)

* ensure backend client returns a ResultSet type in backend tests

Signed-off-by: varun-edachali-dbx <[email protected]>

* formatting (black)

Signed-off-by: varun-edachali-dbx <[email protected]>

* newline for cleanliness

Signed-off-by: varun-edachali-dbx <[email protected]>

* fix circular import

Signed-off-by: varun-edachali-dbx <[email protected]>

* formatting (black)

Signed-off-by: varun-edachali-dbx <[email protected]>

* to_hex_id -> get_hex_id

Signed-off-by: varun-edachali-dbx <[email protected]>

* better comment on protocol version getter

Signed-off-by: varun-edachali-dbx <[email protected]>

* formatting (black)

Signed-off-by: varun-edachali-dbx <[email protected]>

* stricter typing for cursor

Signed-off-by: varun-edachali-dbx <[email protected]>

* correct typing

Signed-off-by: varun-edachali-dbx <[email protected]>

* correct tests and merge artifacts

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove accidentally modified workflow files

remnants of old merge

Signed-off-by: varun-edachali-dbx <[email protected]>

* chore: remove accidentally modified workflow files

Signed-off-by: varun-edachali-dbx <[email protected]>

* add back accidentally removed docstrings

Signed-off-by: varun-edachali-dbx <[email protected]>

* clean up docstrings

Signed-off-by: varun-edachali-dbx <[email protected]>

* log hex

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove unnecessary _replace call

Signed-off-by: varun-edachali-dbx <[email protected]>

* add __str__ for CommandId

Signed-off-by: varun-edachali-dbx <[email protected]>

* take TOpenSessionResp in get_protocol_version to maintain existing interface

Signed-off-by: varun-edachali-dbx <[email protected]>

* active_op_handle -> active_command_id

Signed-off-by: varun-edachali-dbx <[email protected]>

* ensure None returned for close_command

Signed-off-by: varun-edachali-dbx <[email protected]>

* account for ResultSet return in new pydocs

Signed-off-by: varun-edachali-dbx <[email protected]>

* pydoc for types

Signed-off-by: varun-edachali-dbx <[email protected]>

* move common state to ResultSet parent

Signed-off-by: varun-edachali-dbx <[email protected]>

* stronger typing in resultSet behaviour

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove redundant patch in test

Signed-off-by: varun-edachali-dbx <[email protected]>

* add has_been_closed_server_side assertion

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove redundancies in tests

Signed-off-by: varun-edachali-dbx <[email protected]>

* more robust close check

Signed-off-by: varun-edachali-dbx <[email protected]>

* use normalised state in e2e test

Signed-off-by: varun-edachali-dbx <[email protected]>

* simplify corrected test

Signed-off-by: varun-edachali-dbx <[email protected]>

* add line gaps after multi-line pydocs for consistency

Signed-off-by: varun-edachali-dbx <[email protected]>

* use normalised CommandState type in ExecuteResponse

Signed-off-by: varun-edachali-dbx <[email protected]>

---------

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove un-necessary initialisation assertions

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove un-necessary line breaks

Signed-off-by: varun-edachali-dbx <[email protected]>

* more un-necessary line breaks

Signed-off-by: varun-edachali-dbx <[email protected]>

* constrain diff of test_closing_connection_closes_commands

Signed-off-by: varun-edachali-dbx <[email protected]>

* reduce diff of test_closing_connection_closes_commands

Signed-off-by: varun-edachali-dbx <[email protected]>

* use pytest-like assertions for test_closing_connection_closes_commands

Signed-off-by: varun-edachali-dbx <[email protected]>

* ensure command_id is not None

Signed-off-by: varun-edachali-dbx <[email protected]>

* line breaks after multi-line pydocs

Signed-off-by: varun-edachali-dbx <[email protected]>

* ensure non null operationHandle for commandId creation

Signed-off-by: varun-edachali-dbx <[email protected]>

* use command_id methods instead of explicit guid_to_hex_id conversion

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove un-necessary artifacts in test_session, add back assertion

Signed-off-by: varun-edachali-dbx <[email protected]>

* add from __future__ import annotations to remove string literals around forward refs, remove some unused imports

Signed-off-by: varun-edachali-dbx <[email protected]>

* move docstring of DatabricksClient within class

Signed-off-by: varun-edachali-dbx <[email protected]>

* move ThriftResultSet import to top of file

Signed-off-by: varun-edachali-dbx <[email protected]>

* make backend/utils __init__ file empty

Signed-off-by: varun-edachali-dbx <[email protected]>

* use from __future__ import annotations to remove string literals around Cursor

Signed-off-by: varun-edachali-dbx <[email protected]>

* use lazy logging

Signed-off-by: varun-edachali-dbx <[email protected]>

* replace getters with property tag

Signed-off-by: varun-edachali-dbx <[email protected]>

* set active_command_id to None, not active_op_handle

Signed-off-by: varun-edachali-dbx <[email protected]>

* align test_session with pytest instead of unittest

Signed-off-by: varun-edachali-dbx <[email protected]>

* formatting (black)

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove repetition from Session.__init__

Signed-off-by: varun-edachali-dbx <[email protected]>

* mention that if catalog / schema name is None, we fetch across all

Signed-off-by: varun-edachali-dbx <[email protected]>

* mention fetching across all tables if null table name

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove lazy import of ThriftResultSet

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove unused import

Signed-off-by: varun-edachali-dbx <[email protected]>

* better docstrings

Signed-off-by: varun-edachali-dbx <[email protected]>

* clarified role of cursor in docstring

Signed-off-by: varun-edachali-dbx <[email protected]>

---------

Signed-off-by: varun-edachali-dbx <[email protected]>